I.  INTRODUCTION


Considerable effort has been spent by the Naval Personnel
Research and Development Center (NPRDC) to develop a model
that would enable the Navy to forecast future states of the
enlisted force structure. This model, entitled FAST (see [2],
[4], and [5]), is a highly comprehensive model that involves
acquisitions, losses, and advancements, as well as a large number
of subcategories of these variables, of the Navy personnel force.

FAST  has  been  used  successfully  in  the  past  few  years  as  a 
long-range  planning  tool  as  well  as  for  researching  the  behavior 
of the enlisted force. Because of the complexity of the model, its
operation requires a large amount of data processing and computer
time.

In  an  attempt  to  increase  the  flexibility  of  FAST,  this  research 
effort  concentrated  on  a  single  variable  of  the  personnel  force: 
losses.  Since  forecasting  future  losses  is  one  of  the  major  tasks 
of  FAST,  it  was  considered  important  to  attempt  to  simplify  that 
single  aspect  of  FAST. 

II.  THE  FORECASTING  PROBLEM 

The  enlisted  Navy  force  is  organized  and  managed  along  the 
lines  of  ratings,  that  is,  job  skills  within  the  Navy.  Consequently, 
the job of forecasting losses must be done for each rating individually.
In addition, losses categorized by length of service
and  pay  grade  simultaneously  are  preferred,  so  that  the  effects 
of  projected  losses  on  the  force  structure  can  be  forecast  as  well. 
When all of the above variables are considered simultaneously,
the population of individuals being considered is greatly
diminished. For example, while the number of E-5's with 15
years of service may be several hundred, the number of Electronic
Technicians who are E-5 with 15 years of service is small.

This problem of sparse data makes the task of accurate forecasting
difficult. Procedures for forecasting are all predicated
on  some  statistical  stability  in  people's  actions.  This  stability 
comes  about  with  large  populations  of  individuals  whose  reactions 
are  similar.  With  the  small  populations  that  are  inherent  in 
sparse  data,  the  consequent  lack  of  statistical  stability  makes 
reliable  forecasting  difficult  at  best. 

To  help  overcome  the  problems  caused  by  sparse  data,  the 
populations  can  be  recombined  to  form  fewer  groups  of  larger 
sizes.  A  natural  choice  for  this  combination,  or  pooling  of  data, 
is  along  the  lines  of  ratings.  That  is,  if  ratings  which  exhibit 
similar  loss  behavior  statistically  are  identified  and  grouped, 
or  clustered  together,  the  resulting  clusters  can  be  used  in  place 
of  ratings  to  gain  some  statistical  stability.  The  pooling  of  data 
in  clusters  of  ratings  is  sought  only  to  improve  the  estimates  of 
loss  characteristics  and  of  certain  parameters  in  statistical 
models.  The  forecasting  of  losses  for  each  rating  can  still  be 
accomplished.  This  then  is  one  reason  for  finding  clusters  of 
Navy ratings which exhibit similar loss behavior. Another application
of the clustering would be to identify groups of ratings
to which common policies regarding loss and retention might be
applied. The following sections of this report describe approaches
to identifying the clusters and a procedure for estimating their
possible effectiveness in improving forecasts.

For  the  purpose  of  our  analysis,  losses  were  defined  to  include 
losses  for  all  reasons,  from  all  pay  grades  and  length  of  service 
cells. Actual prediction of losses is more complex, involving
many variables, as described in [2] and [4].

III.  HIERARCHICAL  CLUSTERING 

A common technique for clustering is the hierarchical clustering
method. We give a brief description of the method here; Ref.
[1] provides more details.

The  hierarchical  clustering  approach  groups  objects,  in  our 
case  Navy  ratings,  into  several  sets  of  clusters,  each  one  contained 
in  the  previous  one.  Figure  1  shows  a  small  example  of  the  result 
for  5  objects. 

The  tree  structure  in  Figure  1,  called  a  dendrogram,  indicates 
how  this  procedure  formed  the  groups  of  clusters.  The  order  shown 
here  is  not  unlike  the  groupings  which  occur  in  biological  taxonomy, 
where  all  life  forms  are  grouped,  first  into  species,  then  into 
genera,  then  into  families,  and  so  on.  This  method  may  appropriately 
be  called  numerical  taxonomy. 

The dendrogram in Figure 1 shows the 5 individual objects being
grouped into two groups, objects 1 and 2, and objects 3, 4, and 5.
This is the first grouping beyond the base level of 5 singleton
groups. A coarser grouping brings all 5 objects into a single
set. The distance scale provides a measure of selectivity in forming
the groups. If the "distance" allowed between objects to be clustered
together is 10, then just two groups are formed. This criterion
must be increased to 90 before the first two groups become one,
thus indicating that the clustering into two groups is probably natural,
while a clustering into one group is probably not. The interpretation
of what groupings are natural is somewhat subjective if
based only on the dendrogram. As described later, the clusters
in this application are evaluated apart from the dendrogram.

Figure 1: A Dendrogram for Hierarchical Clustering

In order to produce a dendrogram, a "distance" between each
pair of objects must be specified. In this application, the objects
are enlisted Navy ratings, and the distance between two ratings should
measure the proximity of their loss behavior. The distance function
chosen for this purpose is

$$ d(k,m) = \left[ \sum_{i=1}^{7} p^{\,7-i} \left( \ell_{i,k} - \ell_{i,m} \right)^{2} \right]^{1/2} $$

where

    $d(k,m)$ = distance between rating $k$ and rating $m$,
    $\ell_{i,k}$ = loss rate from rating $k$ in year $i$,
    $\ell_{i,m}$ = loss rate from rating $m$ in year $i$,
    $p$ is a parameter, $0 < p \le 1$,

and years are indexed with 1966 for i = 1, 1967 for i = 2, ..., 1972
for i = 7. These years are being used simply because they comprise
the data base for the research project. The parameter p is included
to weight the recent years more heavily. Thus, two ratings are
judged "close" by this criterion if their loss rates are close,
especially in recent years. The specific value for the parameter
p remains to be determined by the methods discussed in a later
section.
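
The distance function can be illustrated with a short sketch. It assumes the weighted Euclidean form, in which the most recent year carries weight 1 and each earlier year is discounted by a further factor of p; the function name and the value p = 0.7 are hypothetical choices made only for illustration, and the two loss-rate series are the 1966-72 values listed for Journalist and Postal Clerk in Table 2.

```python
import math

def weighted_distance(rates_k, rates_m, p):
    """Weighted Euclidean distance between two loss-rate histories.

    Both sequences list yearly loss rates, oldest year first. The most
    recent year gets weight 1 and each earlier year is discounted by a
    further factor of p (an assumed reading of the distance function).
    """
    if len(rates_k) != len(rates_m):
        raise ValueError("both ratings need the same number of years")
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    n = len(rates_k)
    total = 0.0
    for i, (lk, lm) in enumerate(zip(rates_k, rates_m), start=1):
        total += p ** (n - i) * (lk - lm) ** 2
    return math.sqrt(total)

# Loss rates for 1966-72 as listed for two ratings in Table 2.
journalist = [25.88, 34.21, 32.02, 33.94, 41.72, 41.68, 38.09]
postal_clerk = [24.98, 37.05, 38.91, 44.08, 53.77, 42.12, 40.23]

# p = 0.7 is an arbitrary illustrative value, not one taken from the study.
print(weighted_distance(journalist, postal_clerk, p=0.7))
```

With p = 1 every year counts equally; smaller values of p shrink the contribution of the early years, so the same pair of ratings can only move closer together as p decreases.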

Once a distance between ratings has been defined, it is
necessary to define a distance function between subsets of
ratings, so that the hierarchical clustering
algorithm is fully specified. While many definitions of distance
between subsets are possible, two were investigated and one
finally used. The "maximum metric" is defined to be the maximum
of all distances between pairs of objects, one chosen from each
subset. If $C_1$ and $C_2$ are two subsets of ratings, we have

$$ d_{\max}(C_1, C_2) = \max \{\, d(k,m) \mid k \in C_1,\ m \in C_2 \,\}. $$

The "minimum metric" is analogously defined, with min replacing
max in the above definition.

Under  the  maximum  metric,  two  subsets  of  ratings  are  close 
only  if  all  ratings  are  close  to  each  other.  The  minimum  metric 
only  requires  that  two  ratings  in  the  subsets  be  close,  while 
others  may  be  distant,  for  the  subsets  to  be  close.  These  two 
definitions  generate  strikingly  different  dendrogram  shapes  as 
illustrated  later. 
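
The effect of the two subset metrics can be seen in a small sketch. The code below is a naive agglomerative procedure written for illustration only, not the program used in the study; the function names and the five objects placed on a line are hypothetical.

```python
from itertools import product

def subset_distance(c1, c2, d, metric="max"):
    """Distance between two subsets under the maximum or minimum metric."""
    pair_distances = [d(k, m) for k, m in product(c1, c2)]
    return max(pair_distances) if metric == "max" else min(pair_distances)

def agglomerate(objects, d, metric="max"):
    """Repeatedly merge the two closest subsets.

    Returns the merge history as (distance, merged_subset) tuples, which
    is the information a dendrogram displays.
    """
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = subset_distance(clusters[a], clusters[b], d, metric)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        merges.append((dist, merged))
    return merges

# Five hypothetical objects located on a line; distance is the gap between them.
position = {1: 0, 2: 1, 3: 10, 4: 11, 5: 12}
gap = lambda k, m: abs(position[k] - position[m])

for metric in ("max", "min"):
    print(metric, [dist for dist, _ in agglomerate(position, gap, metric)])
```

Under the maximum metric a merged subset is only as close to its neighbors as its farthest member, so merge distances tend to spread out; under the minimum metric a single close pair is enough to join two subsets.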

IV.  CLUSTERING  BY  CORRELATION

1.  Correlating  Population  Size  and  Corresponding  Loss  Rate. 
Examination  of  the  data  on  population  sizes  and  loss 
rates  in  various  ratings  over  the  years  1966-72  suggested  that 
ratings  may  be  grouped  on  the  basis  of  whether  their  population 
size  correlates  positively  or  negatively  (and  to  what  extent) 
with  their  corresponding  loss  rate. 
For example, it appears that some ratings, such as Quartermaster
(200 QM), have their loss rate increase (or decrease)
together with their population size over the years 1966-72. At
the same time, other ratings, such as Construction Recruit (6000
CR), have their population size and loss rate tend (in most cases)
in opposite directions from one year to the next.

The  correlation  between  population  size  and  loss  rate  was 
studied  for  all  ratings  and  "All  Navy"  over  the  seven  data  points, 
provided  by  the  years  1966-72.  In  addition  to  measuring  the 
correlation  directly  for  these  data  points,  rank  correlation  was 
also  used,  since  the  actual  magnitude  of  the  changes  in  population 
size  seemed  both  unimportant  and  incongruous  when  compared  to  changes 
in  the  loss  rate. 

Two different rank correlation coefficients were used. These
(see [1]) are defined below in terms of the rankings, $P_1, \ldots, P_7$,
of the seven population sizes, over the years 1966-72, of a given
rating and the rankings $\ell_1, \ldots, \ell_7$ of the seven corresponding
loss rates.

(i) Spearman's Rho:

Let $D_i = P_i - \ell_i$, $i = 1, \ldots, 7$, be the difference in the rankings. Then

$$ \rho = 1 - \frac{6}{7(7^2 - 1)} \sum_{i=1}^{7} D_i^2 = 1 - \frac{1}{56} \sum_{i=1}^{7} D_i^2 . $$

(ii) Kendall's Tau:

Let

$$ A_{ij} = \begin{cases} +1 & \text{if } (P_i - P_j)(\ell_i - \ell_j) > 0 \\ -1 & \text{if } (P_i - P_j)(\ell_i - \ell_j) < 0 \end{cases} \qquad i \ne j . $$

Then

$$ \tau = \frac{1}{21} \sum_{1 \le i < j \le 7} A_{ij} . $$

(iii) Ordinary Correlation Coefficient:

If $P_i$ and $\ell_i$ denote the actual magnitude of the population
sizes and corresponding loss rates respectively of a rating over the
years 1966-72, the correlation coefficient is defined as

$$ r = \frac{\sum_{i=1}^{7} (P_i - \bar{P})(\ell_i - \bar{\ell})}{\left[ \sum_{i=1}^{7} (P_i - \bar{P})^2 \, \sum_{i=1}^{7} (\ell_i - \bar{\ell})^2 \right]^{1/2}} $$

where

$$ \bar{P} = \frac{1}{7} \sum_{i=1}^{7} P_i \quad \text{and} \quad \bar{\ell} = \frac{1}{7} \sum_{i=1}^{7} \ell_i . $$

Each  of  these  correlation  coefficients  provides  a  method  of 
clustering  of  ratings.  Kendall's  Tau  seemed,  perhaps,  the  most 
accommodating  in  providing  clusters  that  separate  in  a  somewhat 
natural  way.  Thus,  three  clusters  may  be  formed  on  the  basis 
of  the  values  of  Kendall's  Tau: 


(i)    Ratings with $-1.00 \le \tau \le -0.13$   (Cluster A)

(ii)   Ratings with $-0.13 < \tau < +0.50$   (Cluster B)

(iii)  Ratings with $+0.50 \le \tau \le +1.00$   (Cluster C)

Table 1 shows a histogram of loss rates for ratings against their
$\tau$-values. Each of the three clusters may be broken into further
subclusters  in  various  ways  based  on  the  loss  rates  of  the  ratings 
in  each  cluster.  Such  methods  are  suggested  in  the  next  subsection. 

2.  Correlating  Loss  Rates  with  All  Navy  Population  Size. 

If the above procedure for clustering ratings is to be
useful, it should provide a procedure for forecasting future loss
rates through the use of clusters. Since the above clusters are
obtained by correlating loss rates of ratings with the corresponding
population sizes, one would have to have reasonably accurate estimates
of future population sizes in each rating in order to forecast
corresponding loss rates (and then actual losses). It seems
unlikely that such estimates would be available for each rating
and certainly not several years in advance. If good estimates
of population sizes will be available for future years at all,
it will be for "All Navy" only. For that reason, it appears
desirable to correlate loss rates of ratings with "All Navy" population
size. The three correlation coefficients defined above
are again relevant with the only change that $P_1, \ldots, P_7$ now denote
the "All Navy" population sizes, or their rankings, over the years
1966-72. Table 2 presents the lists of ratings in three clusters
formed on the basis of Kendall's Tau. The three clusters are:

All three of these clusters may be considered too big, and in any
case loss rates of ratings within each cluster vary widely. Since
clusters are envisioned as groups of ratings of like loss rates,
it is necessary to break each of the above clusters into further
subclusters. (The same remark applies when clustering is accomplished
based on correlating each loss rate with its own population
size.)

Table 2: Loss Rates (percent) of Ratings by Cluster, 1966-72, with Kendall's Tau
(A question mark stands for a character that is illegible in the source copy.)

(End of the preceding cluster's listing; the beginning of this row is missing.)

Code  Abbr  Rating                               1966   1967   1968   1969   1970   1971   1972     Tau
...   ...   ...                                   ...    ...    ...  22.00  30.21  26.92  30.33   -0.14

LOSS RATES OF CLUSTER B RATINGS

Code  Abbr  Rating                               1966   1967   1968   1969   1970   1971   1972     Tau
602   GMT   Gunners Mate (Technician)           14.16  21.54  18.35  15.68  21.98  19.75  23.67   -0.05
0           All Navy                             ?.00  20.94  25.69  26.?6  34.15  32.38  ??.??   -?.?5
1010  DS    Data Systems Technician             20.70  18.14  41.94   9.52  13.23  12.27  13.75   -0.05
2490  SH    Ships Serviceman                    16.43  27.94  28.94  33.13  37.93  34.24  30.56    0.05
7600  PH    Photographers Mate                  19.04  24.23  26.44  21.84  32.04  28.52  25.20    0.05
6900  AM    Aviation Structural Mechanic (4)    15.31  19.04  ??.?5  ??.??  25.35  71.59  70.95    ?.??
6800  AE    Aviation Electricians Mate          17.84  20.01  21.54  18.99  25.42  23.73  20.91    0.05
8000  HM    Hospital Corpsman                   19.75  21.76  19.67  19.80  32.98  24.98  22.95    0.05
3800  EN    Engineman                           1?.16  2?.9?  27.14  27.11  36.99  30.23  32.65    0.05
4600  PM    Patternmaker                        17.?0  21.43  33.88  1?.?1  37.61  30.?3  ??.??    ?.??
7300  AK    Aviation Storekeeper                19.72  21.??  21.70  22.28  30.68  32.02  19.00    0.14
3900  MR    Machinery Repairman                 19.74  30.36  30.66  29.94  36.93  29.09  33.53    0.14
7000  PR    Aircrew Survival Equipman           15.57  20.03  16.50  16.37  22.88  22.63  19.81    0.14
1701  LN    Legalman                            12.35  12.52  19.??  ??.??  46.75  ?7.?7  ??.??    ?.??
500   TM    Torpedomans Mate                    12.77  22.77  21.97  21.19  25.77  21.59  23.32    0.14
6500  AO    Aviation Ordnanceman                1?.24  22.7?  21.29  20.23  29.05  23.53  22.56    0.14
2700  PC    Postal Clerk                        24.98  37.05  38.91  44.08  53.77  42.12  40.23    0.14
3700  MM    Machinists Mate                     17.11  24.34  25.48  2?.5?  7?.?9  25.17  73.?0    ?.??
2290  CS    Commissaryman                       14.4?  23.04  22.67  24.92  29.64  24.28  26.00    0.14
2600  JO    Journalist                          25.88  34.21  32.02  33.94  41.72  41.68  38.09    0.14
3300  MU    Musician                            19.27  21.63  13.89  14.29  32.56  24.45  18.17    0.14
600   GM    Gunners Mates (3)                   17.67  25.76  25.38  ??.??  35.39  28.09  ??.??    0.1?
3100  LI    Lithographer                        30.47  37.89  34.43  33.91  47.55  34.43  39.07    0.14
6600  AC    Air Controlman                      14.02  21.64  19.26  17.44  26.59  25.14  21.59    0.14
4700  ML    Moulder                             12.65  26.25  24.89  29.91  26.22  24.02  28.51    0.24
4200  IC    Interior Communication Elec.        18.79  ??.??  27.44  75.95  37.10  24.81  ??.??    ?.??
1200  OM    Opticalman                          16.53  25.26  26.01  24.62  26.70  21.29  24.07    0.24
100   BM    Boatswains Mate                     17.77  29.55  33.36  37.96  42.57  33.46  30.18    0.24
810   MT    Missile Technician                   4.90   7.76  11.91  17.94  17.71  10.85  10.42    0.24

LOSS RATES OF CLUSTER C RATINGS

Code  Abbr  Rating                               1966   1967   1968   1969   1970   1971   1972     Tau
200   QM    Quartermaster                       22.85  31.67  28.06  34.12  36.17  32.76  31.19    0.3?
1900  DP    Data Processing Technician          21.02  25.47  22.55  24.75  35.39  23.75  ??.??    0.33
7100  AG    Aerographers Mate                   15.65  24.15  21.36  21.10  27.74  25.4?  20.36    0.??
7400  AZ    Aviation Maintenance Admin.         27.2?  ?2.16  30.37  29.47  39.06  40.72  24.55    0.33
2000  SK    Storekeeper                         ?7.20  25.25  26.87  28.74  35.74  27.48  24.93    0.??
1500  RM    Radioman                            17.79  22.99  22.96  26.45  28.?9  22.95  24.24    0.33
4100  EM    Electricians Mate                   17.78  27.10  27.12  26.81  60.51  25.66  27.08    ?.??
4000  BT    Boilerman (2)                       20.33  30.38  27.72  31.64  32.95  26.3?  31.01    0.33
6700  AB    Aviation Boatswains Mate (4)        21.89  32.68  29.43  27.69  37.20  35.50  22.43    0.?3
250   SM    Signalman                           19.35  27.58  27.13  29.81  31.54  25.80  27.36    0.43
2100  DK    Disbursing Clerk                    16.33  26.76  29.53  30.99  30.56  24.60  24.37    0.43
7200  TD    Tradevman                           11.02  15.40  19.81  19.04  25.05  13.66  12.23    0.43
1800  PN    Personnelman                        20.31  25.19  25.91  30.20  31.86  25.61  22.19    0.43
4500  DC    Damage Control                      20.41  28.94  24.61  32.27  61.86  29.09  ??.??    ?.?2
400   ST    Sonar Technicians (3)               17.01  23.52  20.83  26.32  27.75  15.76  10.10    0.62
1000  ET    Electronics Technicians (3)         18.34  23.74  24.01  24.21  25.60  13.97  13.69    0.71
800   FT    Fire Control Technicians (4)        19.12  26.18  22.26  25.25  27.72  18.55  14.01    0.90

Further subclusters may be formed by selecting one of
several candidate statistics, such as:

(i)  The mean loss rate of ratings over the seven years;

(ii)  The median loss rate of ratings over the seven years;

(iii)  The mean or median loss rate of ratings over the last
three years only;

(iv)  The loss rate of ratings of the last year only.

For demonstration purposes, one of these statistics, namely
the median loss rate of ratings over the three years 1970-72, was
selected. Figure 2 shows each of the ratings (and "All Navy")
represented by its median loss rate over the years 1970-72. The
three clusters referred to above are separated in the graph. The
graph itself suggests further subclusters based on the size of the
loss rates. For example, Cluster A may be grouped in four
subclusters based on the median loss rate $\ell_i(m)$ of (ii):

(a)  Ratings in Cluster A with $0\% \le \ell_i(m) \le 20\%$   ($A_1$)

(b)  Ratings in Cluster A with $20\% < \ell_i(m) \le 27\%$   ($A_2$)

(c)  Ratings in Cluster A with $27\% < \ell_i(m) \le 33\%$   ($A_3$)

(d)  Ratings in Cluster A with $33\% < \ell_i(m) \le 100\%$   ($A_4$)

Similar  subclusters  may  be  formed  within  Clusters  B  and  C.  These 
are  indicated  in  Figure  2  by  vertical  lines  drawn  as  boundaries 
between  neighboring  subclusters. 
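
A minimal sketch of this subclustering step follows, assuming the boundaries 20%, 27%, 33%, and 100% given for Cluster A and taking the median over the last three years of each series. The function name and the sample series are hypothetical.

```python
from statistics import median

# Upper boundaries (percent) of subclusters 1 to 4, as given for Cluster A.
SUBCLUSTER_BOUNDS = (20.0, 27.0, 33.0, 100.0)

def subcluster_index(loss_rates, bounds=SUBCLUSTER_BOUNDS):
    """Return the 1-based subcluster for a rating.

    loss_rates lists yearly loss rates in percent, oldest year first;
    only the last three years (1970-72 in the study) are used.
    """
    m = median(loss_rates[-3:])
    if m < 0:
        raise ValueError("loss rate cannot be negative")
    for idx, upper in enumerate(bounds, start=1):
        if m <= upper:
            return idx
    raise ValueError("median loss rate exceeds the last boundary")

# Hypothetical seven-year series for two ratings.
print(subcluster_index([10.0, 12.0, 11.0, 13.0, 18.0, 25.0, 19.0]))
print(subcluster_index([20.0, 22.0, 21.0, 24.0, 26.92, 30.33, 30.21]))
```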

A shortcoming of this method is that it is quite "ad hoc" in
selecting the boundaries between clusters and subclusters. Also,
since at the start clusters are formed based on values of the
correlation coefficients, ratings of similar losses may be found
in separate clusters. Thus, for example, many ratings in Cluster C have
loss rates closer to those of some ratings in Cluster B than those
of ratings in their own subcluster. This may be regarded as a
disadvantage if one considers it an overriding necessity to
cluster by like loss rates. On the other hand, ratings with similar
loss rates may be placed in different clusters because these loss
rates may be tending in opposite directions over the years. It
may be desirable in such cases to group such ratings separately
despite their like loss rates.

Because  of  the  ad  hoc  nature  of  this  clustering  method  it 
was  not  used  in  the  rest  of  this  research  effort. 

V.  EVALUATION  OF  HIERARCHICAL  CLUSTERS 

The  methods  described  above  lead  to  various  clusterings  or 
partitions  of  the  enlisted  ratings.  In  this  section,  we  describe 
how  any  such  partition  was  evaluated. 

Let  the  set  of  enlisted  ratings  be  designated  S,  where 

S  =  {1,2, ... ,N} 

and N is the number of ratings being considered. In our case,
N = 71 ratings. The total number of individual ratings is about
130; however, some of the 130 are service ratings which support a
general rating. In these instances, several service ratings contain
men specializing in a similar area, usually at the middle
pay grades such as E4 to E6 or E7. A single general rating associated
with these service ratings contains all men at the pay grades beyond
those of the service ratings, in the common area. The general rating
then contains the foremen and line managers for the men in the
service ratings. When this occurred, all the service ratings and
their associated general rating were combined into a pseudo rating
for the analysis. This avoided having ratings with only a few
pay grades. The common technical skill areas of these ratings
made their prior combination seem natural, and reduced the number
of ratings analyzed to 71. A few recent ratings with no history
in our data base were left out, as they were a special case and
quite few in number. The following table shows the definition of
ratings used for the study, with the actual rating codes included
in each of our ratings.

With the ratings as defined above, a partition or clustering
of S is a set of subsets $C_1, \ldots, C_m$ of S for which

$$ C_k \cap C_j = \emptyset \ \text{ if } k \ne j, \qquad \bigcup_{k} C_k = S . $$

If there are m subsets (k = 1, ..., m), the partition is said
to be of size m. Many partitions, suggested primarily by the
hierarchical clustering method, were evaluated by a method
described below.
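
The two conditions that define a partition (the subsets are pairwise disjoint and their union is S) can be checked mechanically. The sketch below is illustrative only; the function name and the small example sets are hypothetical.

```python
def is_partition(clusters, universe):
    """True if the clusters are pairwise disjoint and together cover universe."""
    seen = set()
    for cluster in clusters:
        members = set(cluster)
        if seen & members:          # some rating appears in two clusters
            return False
        seen |= members
    return seen == set(universe)    # every rating is covered, nothing extra

S = range(1, 8)                     # a hypothetical set of 7 rating indices
print(is_partition([{1, 2}, {3, 4, 5}, {6, 7}], S))
print(is_partition([{1, 2}, {2, 3, 4, 5}, {6, 7}], S))
```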

Table 3: Ratings Used in the Study

Index in S  Name                                    Rating Codes
 1          Boatswains Mate                         100
 2          Quartermaster                           200
 3          Signalman                               250
 4          Operations Specialist                   300
 5          Sonar Technicians                       400, 401, 404
 6          Torpedomans Mate                        500
 7          Gunners Mates                           600, 601, 604
 8          Gunners Mate Technician                 602
 9          Fire Control Technicians                800, 801, 802, 8[illegible]
10          Missile Technician                      810
11          Mineman                                 900
12          Electronics Technicians                 1000, 1001, 1002
13          Data Systems Technician                 1010
14          Instrumentman                           1100
15          Opticalman                              1200
16          Radioman                                1500
17          Communication Technicians               1600, 1611, 1622, 1633, 1644, 1655, 1666
18          Yeoman                                  1700
19          Legalman                                1701
20          Personnelman                            1800
21          Data Processing Technician              1900
22          Storekeeper                             2000
23          Disbursing Clerk                        2100
24          Commissaryman                           2290
25          Ships Serviceman                        2490
26          Journalist                              2600
27          Postal Clerk                            2700
28          Lithographer                            3100
29          Illustrator Draftsman                   3200
30          Musician                                3300
31          Seaman Recruit                          3600
32          Machinists Mate                         3700
33          Engineman                               3800
34          Machinery Repairman                     3900
35          Boilerman                               4000, 4020
36          Electricians Mate                       4100
37          Interior Communication Elec.            4200
38          Hull Technicians                        4300, 4410, 4411, 4412
39          Damage Control                          4500
40          Patternmaker                            4600
41          Moulder                                 4700
42          Fireman Recruit                         5000
43          Engineering Aid                         5100, 5101, 5102
44          Construction Electrician                5300, -1, -2, -3, -4, -5, -6
45          Equipment Operator                      5410, 5411, 5412
46          Construction Mechanic                   5500, 5503, 5504
47          Builder                                 5600, 5601, 5602, 5603
48          Steel Worker                            5700, 5703, 5704
49          Utilitiesman                            5800, 5801, 5802, 5803, 5804
50          Construction Recruit                    6000
51          Aviation Machinists Mate                6200, 6205, 6206
52          Aviation Electronics Technician         6300, 6304, 6306, 6307
53          Aviation Antisub Warfare Technician     6310
54          Aviation Ordnanceman                    6500
55          Aviation Fire Control Technician        6520, 6521, 6522
56          Air Controlman                          6600
57          Aviation Boatswains Mate                6700, 6704, 6705, 6706
58          Aviation Electricians Mate              6800
59          Aviation Structural Mechanic            6900, 6901, 6902, 6903
60          Aircrew Survival Equipman               7000
61          Aerographers Mate                       7100
62          Tradevman                               7200
63          Aviation Storekeeper                    7300
64          Aviation Maintenance Admin.             7400
65          Aviation Support Equip. Technician      7500, 7502, 7503
66          Photographers Mate                      7600
67          Photographic Intelligence               7700
68          Airman Recruit                          7800
69          Hospital Corpsman                       8000
70          Dental Technician                       8300
71          Steward                                 8500

This research investigation was conducted for the express
purpose of finding out if the prediction of losses by forecasting
loss rates could be improved when data was pooled among ratings in
clusters, for some systematically well-defined clustering. The
approach was to forecast losses by a method approximating the one
actually used, and for which the clustering was originally intended.
The forecasting was done for the year 1973 (fiscal year), using
data in the years 1966-72. Then, the predicted losses were
compared to the actual losses in 1973. The prediction scheme
was not detailed enough to be used for actually forecasting
losses, and was only intended to be an evaluation of clustering.
If clustering is to improve significantly the forecasting (by any
means), then it should improve forecasting by the elementary
prediction scheme given below.

To evaluate any clustering or partition $C_k$, $k = 1, \ldots, m$,
the following approach was used. First, a projection of total
losses was made for each individual rating by projecting the loss
rate, i.e., the proportion of those on board at the year's start
who would be lost over the year. Let

    $I_{i,j}$ = inventory (of men) at the beginning of year $i$, in rating $j$,
    $L_{i,j}$ = losses during year $i$ from rating $j$,

where the indices are $i = 1, 2, \ldots, 7$ for years 1966, 1967, ..., 1972
respectively, and $j = 1, 2, \ldots, N$.

The estimated loss rate in 1973 for rating $j$, denoted $\hat{\ell}_j$,
was obtained from a weighted average of the actual loss rates
in prior years. Specifically,

$$ \hat{\ell}_j = \frac{\sum_{i=1}^{7} a^{7-i} \left( L_{i,j} / I_{i,j} \right)}{\sum_{i=1}^{7} a^{7-i}} $$

where $a$ is a fixed weighting factor, $0 < a < 1$. This estimated
loss rate was applied to the 1973 inventory $I_{8,j}$, yielding

$$ \hat{L}_j = \hat{\ell}_j \cdot I_{8,j} $$

as the estimated loss from rating j in 1973, using no clustering.
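
The unclustered prediction can be sketched in a few lines, assuming the loss rate in each year is losses divided by beginning-of-year inventory and that the most recent year carries weight 1. The function names, the value a = 0.5, and the two-year example are hypothetical and are not data from the study.

```python
def weighted_loss_rate(losses, inventories, a):
    """Weighted average of yearly loss rates, recent years weighted most.

    losses[i] and inventories[i] refer to the same year, oldest first.
    """
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    n = len(losses)
    weights = [a ** (n - i) for i in range(1, n + 1)]
    rates = [loss / inv for loss, inv in zip(losses, inventories)]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)

def forecast_loss(losses, inventories, next_inventory, a):
    """Estimated losses in the next year for one rating, without clustering."""
    return weighted_loss_rate(losses, inventories, a) * next_inventory

# Hypothetical two-year history for one rating.
print(forecast_loss([10, 30], [100, 100], next_inventory=120, a=0.5))
```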

The same prediction scheme was used with clustering, and both
predictions were compared to the actual loss. To estimate the
loss rate with clusters, let $C_k$, $k = 1, 2, \ldots, m$, be the partition of
the ratings being considered. Then, pooling data over clusters
gives the formula for the common estimated loss rate of ratings
in cluster $C_k$:

$$ \tilde{\ell}_j = \frac{\sum_{i=1}^{7} a^{7-i} \left( \sum_{j' \in C_k} L_{i,j'} \Big/ \sum_{j' \in C_k} I_{i,j'} \right)}{\sum_{i=1}^{7} a^{7-i}} $$

for every $j \in C_k$. Then the estimated loss, obtained by applying
this rate to the 1973 inventory $I_{8,j}$ of rating $j$, is

$$ \tilde{L}_j = \tilde{\ell}_j \cdot I_{8,j} . $$

It  should  be  emphasized  again  that  the  prediction  scheme  used 
here  is  not  intended  to  be  the  best  available  for  the  data  at  hand. 
Our  purpose  is  only  to  evaluate  the  clustering,  by  comparing  loss 
predictions  with  and  without  clustering,  using  the  same  prediction 
scheme  in  both  instances. 
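
The comparison of predictions with and without clustering can be sketched as follows. The pooled loss rate for a year is taken as total cluster losses divided by total cluster inventory, and the same weighting is applied in both cases. All names and the two-rating, two-year data set are hypothetical and serve only to show the mechanics.

```python
def weighted_rate(yearly_rates, a):
    """Weighted average with weight a**(n-i) on year i (oldest year first)."""
    n = len(yearly_rates)
    weights = [a ** (n - i) for i in range(1, n + 1)]
    return sum(w * r for w, r in zip(weights, yearly_rates)) / sum(weights)

def pooled_loss_rate(cluster, losses, inventories, a):
    """Common loss-rate estimate for every rating in the cluster."""
    n_years = len(losses[cluster[0]])
    yearly = []
    for i in range(n_years):
        total_losses = sum(losses[j][i] for j in cluster)
        total_inventory = sum(inventories[j][i] for j in cluster)
        yearly.append(total_losses / total_inventory)
    return weighted_rate(yearly, a)

def compare(cluster, losses, inventories, next_inventory, actual, a):
    """Absolute prediction error per rating, without and with clustering."""
    pooled = pooled_loss_rate(cluster, losses, inventories, a)
    report = {}
    for j in cluster:
        own = weighted_rate(
            [loss / inv for loss, inv in zip(losses[j], inventories[j])], a)
        report[j] = {
            "error_without": abs(own * next_inventory[j] - actual[j]),
            "error_with": abs(pooled * next_inventory[j] - actual[j]),
        }
    return report

# Hypothetical data: two ratings, two years of history, one forecast year.
losses = {"X": [10, 20], "Y": [30, 40]}
inventories = {"X": [100, 100], "Y": [100, 100]}
report = compare(["X", "Y"], losses, inventories,
                 next_inventory={"X": 100, "Y": 100},
                 actual={"X": 20, "Y": 30}, a=0.5)
print(report)
```

In this toy case pooling helps one rating and hurts the other, which is why the evaluation has to be made over all ratings in a partition rather than rating by rating.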

VI.  RESULTS  OF  CLUSTERING  EXPERIMENT 
1.  Dendrograms.

Using the distance function defined in Chapter III, two
dendrograms were drawn for each of several values of the weighting
factor p. The two dendrograms correspond to the maximum and the
minimum metrics, respectively, between clusters as defined in
Chapter III. Figures 3 and 4 show examples of dendrograms with
the minimum and maximum metric respectively. An undesirable
feature of all dendrograms with the minimum metric is, as can be
seen in Figure 3, that separation into clusters does not occur
until sets are at a fairly close "distance" to each other. For
example, in Figure 3, although two clusters form at a "distance"
of 15.60, the next separation into (three) clusters occurs at
a "distance" of 3.12. Further separations occur at very short
intervals, at "distance" values 2.25, 1.692, 1.688, etc. This
makes it rather difficult to decide on the number of clusters to be
used. In contrast, Figure 4 shows a typical dendrogram with the
maximum metric. Here separations into clusters occur quite
gradually, at least until about ten clusters have formed. Separation
into two, three, four, etc., clusters occurs at the "distance"
values 48.7, 29.9, 18.2, 14.3, 9.4, 7.6, etc. This provides more
justification to choose, e.g., four clusters rather than three or
five. In choosing the appropriate number of clusters one must
consider that, while too many clusters would defeat the purpose
of clustering, too few clusters would result in a prediction method
that is too crude. For this reason the proper choice is probably
somewhere between three and ten clusters.
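
One hypothetical way to make the contrast between the two metrics concrete is to compute, for each number of clusters k, the range of "distance" values over which exactly k clusters persist. The sketch below uses the separation distances quoted in the text; the function name is an assumption, and the calculation is offered only as an illustration, not as the selection rule used in the study.

```python
def persistence(separation_distances):
    """Map k to the length of the distance range over which k clusters persist.

    separation_distances[0] is the distance at which 2 clusters form,
    separation_distances[1] the distance at which 3 form, and so on.
    """
    result = {}
    for idx in range(len(separation_distances) - 1):
        k = idx + 2
        gap = separation_distances[idx] - separation_distances[idx + 1]
        result[k] = round(gap, 3)
    return result

# Separation distances quoted in the text for the two example dendrograms.
minimum_metric = [15.60, 3.12, 2.25, 1.692, 1.688]
maximum_metric = [48.7, 29.9, 18.2, 14.3, 9.4, 7.6]

print("minimum metric:", persistence(minimum_metric))
print("maximum metric:", persistence(maximum_metric))
```

Under the minimum metric nearly all of the range is taken up by the two-cluster solution, while under the maximum metric the ranges shrink more gradually as the number of clusters grows.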

[Figure 3. Dendrogram with minimum metric]

[Figure 4. Dendrogram with maximum metric]

2.  Evaluation  of  Clustering.

In order to evaluate the effectiveness of clustering,
the prediction scheme described in Chapter V was devised. According
to this scheme, two estimates, $\tilde{L}_j$ and $\hat{L}_j$, were
computed as predictions with and without clustering, respectively,
for the losses in 1973 from Rating j. When the 1973 data on losses
became available, the actual losses, $L_j$, from Rating j became
known. Histograms were then prepared for the following expressions:

(i)  $L_j - \hat{L}_j$ = error in prediction without clustering,

(ii)  $L_j - \tilde{L}_j$ = error in prediction with clustering,

(iii)  $|L_j - \hat{L}_j| - |L_j - \tilde{L}_j|$ = difference in absolute
errors without and with clustering,

(iv)  $(L_j - \hat{L}_j) \div L_j$ = normalized error in prediction
without clustering,

(v)  $(L_j - \tilde{L}_j) \div L_j$ = normalized error in prediction
with clustering,

(vi)  $(|L_j - \hat{L}_j| - |L_j - \tilde{L}_j|) \div L_j$ = difference in
absolute normalized errors without and with clustering.

The  histograms  were  specifically  examined  for  cases  where  the 
number  of  clusters  was  3,  5,  7,  10,  15  and  20. 
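
The six expressions can be sketched in code for a single rating as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function and argument names are hypothetical, the normalized items are assumed to divide by the actual 1973 losses as the list indicates, and returning `None` for the normalized items when the actual losses are zero is a choice made for this sketch rather than something taken from the study. The example figures are invented.

```python
def error_measures(actual, without_clustering, with_clustering):
    """Compute items (i) through (vi) for one rating.

    actual             -- actual losses for the rating
    without_clustering -- predicted losses without clustering
    with_clustering    -- predicted losses with clustering
    """
    err_without = actual - without_clustering        # item (i)
    err_with = actual - with_clustering              # item (ii)
    abs_diff = abs(err_without) - abs(err_with)      # item (iii)

    if actual == 0:
        # Division by the actual losses is undefined (assumption: report None).
        norm_without = norm_with = norm_abs_diff = None
    else:
        norm_without = err_without / actual          # item (iv)
        norm_with = err_with / actual                # item (v)
        norm_abs_diff = abs_diff / actual            # item (vi)

    return {
        "i": err_without,
        "ii": err_with,
        "iii": abs_diff,
        "iv": norm_without,
        "v": norm_with,
        "vi": norm_abs_diff,
    }


# Hypothetical rating: 200 actual losses, predictions of 180 and 230.
print(error_measures(200, 180, 230))
```

A positive value of item (iii) or (vi) means that the prediction with clustering was closer to the actual losses than the prediction without clustering; in the invented example both are negative.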

The proper choice of value for p, the parameter used to
weight past years according to importance in the clustering
scheme, was also investigated. The value of p could be based
on empirical data considerations. For example, since $0 \le p \le 1$,
the larger the value of p the more emphasis is placed on recent
years in the data base. In this study the value of p to employ
was based only on its effect on clustering. Figure 5 shows at
what level of the distance scale various numbers of clusters
formed as the value of p is changed. This figure suggests
that in the vicinity of p = .1, the points on the distance
scale where clusters form are better separated from each other
than is the case for other values of p.

The choice of value for a, the parameter that weights past
years according to their importance in the prediction scheme,
was not specifically investigated. It seemed natural to assume
that a = p. However, there could be convincing arguments for
choosing a value of a different from p.

Among the types of histograms listed above, item (vi) was
the most relevant for the evaluation of clustering. The "difference
in absolute normalized errors without and with clustering" measures
the relative success of clustering in predicting future losses
versus the success of doing that by a comparable traditional
method. A large number of ratings having positive values for this
measure, especially large positive values, would indicate
significant success of clustering. A high percentage of ratings on
the negative side would suggest the opposite conclusion. The
actual results, however, were not conclusive either way. A typical
histogram is shown in Figure 6 for the case p = .1 and seven
clusters. The mean and median, as in most other such histograms,
are moderately negative, indicating that the clustering was
slightly disadvantageous. As more and more clusters are used, the
histograms become concentrated at the origin, which is to be
expected, as using many clusters is practically equivalent to no
clustering at all. The choice of p did not seem to affect this
result a great deal, although the choice of p = .5 appeared
to be slightly more favorable to the clustering method. Figure 7
shows the histogram corresponding to the case p = .5 and seven
clusters.
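
The way such a histogram is read can be sketched as a short summary of the item (vi) values across ratings: the mean, the median, and the number of ratings on each side of zero. The function name `summarize_differences` and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean, median


def summarize_differences(values):
    """Summarize item (vi) values, one value per rating.

    Positive values favor clustering; negative values favor the
    prediction without clustering.
    """
    positive = sum(1 for v in values if v > 0)
    negative = sum(1 for v in values if v < 0)
    return {
        "mean": mean(values),
        "median": median(values),
        "favoring_clustering": positive,
        "against_clustering": negative,
        "ties": len(values) - positive - negative,
    }


# Invented item (vi) values for seven ratings (illustration only).
sample = [0.04, -0.10, -0.02, 0.01, -0.03, 0.0, -0.06]
print(summarize_differences(sample))
```

In this invented sample the mean and median are both slightly negative and more ratings fall on the negative side, which is the pattern the text describes as slightly disadvantageous to clustering.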

The fact that the clustering method resulted in somewhat
bigger (absolute normalized) errors than the standard prediction
method does not render clustering totally worthless. Since the
two methods achieve a nearly identical measure of success, the
clustering method may still be advantageous because it shortens
the data processing procedures. This may
be a more relevant factor when the forecasting technique is not of
the simple variety described here, but instead is a more complex
one such as that used in FAST, described in [2], [4] and [5].

The histograms presented above do not show the size of
errors made by either the clustering or the standard forecasting
method. The histogram presented in Figure 8 exhibits the size of
the normalized errors when forecasting by clustering (item (v)
above) for the case p = .1 and seven clusters. The horizontal
scale is in percent. The figure shows that 58 of the 71
ratings had an error of less than 25% (positive or negative). For
one rating the error is shown as -100%. This is due to a rating
(Legalman) for which there were zero losses in 1973, while the
clustering method forecast 464. Since the zero loss in 1973 is
probably due to a data processing error, this large forecasting
error seems forgivable.

The histograms presented here are representative of the many
more cases which were tried. The results in every case were
essentially the same, namely one of indifference to clustering
the data for loss rate prediction. The number of subsets
was explored, as well as the choice of the parameters p and
a. The numerous dendrograms and histograms produced from these
experiments are retained by the authors.

A by-product of this project is the identification of subsets
of ratings with common loss behavior. Such a grouping of ratings
would, for example, suggest guidelines for the application of personnel
policy to select groups of ratings. Other applications could be
explored as well by simply changing the criterion by which ratings
are judged to be close to each other. Groupings of ratings
could then quickly and easily be identified, based on
characteristics of behavior other than loss from the service.